
    Semantic-Based Destination Suggestion in Intelligent Tourism Information Systems

    Abstract. In recent years, there has been a growing interest in mining trajectories of moving objects. Advances in this data mining task are likely to support the development of new applications such as mobility prediction and service pre-fetching. Approaches reported in the literature consider only the spatio-temporal information provided by collected trajectories. However, some applications demand additional sources of information to make correct predictions. In this work, we consider the case of an online tourist support service that aims at suggesting nearby places to visit. We assume that tourist interests depend both on her/his geographical position and on the "semantic" information extracted from geo-referenced documents associated with the visited sites. Therefore, the suggestion is based on both spatio-temporal and textual data. To deal with the drift of the tourist's interests, we apply a time-slice density estimation method. Experimental results are reported for two scenarios.
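
    The time-slice idea can be illustrated with a minimal sketch, assuming visits are (timestamp, x, y) tuples and using a Gaussian kernel density estimate computed only from the visits that fall inside the current time slice. The helpers `time_slice_density` and `rank_sites`, the `bandwidth` parameter and the Gaussian kernel are hypothetical choices for illustration; the sketch also leaves out the textual (semantic) component described above.

```python
import math


def time_slice_density(visits, slice_start, slice_end, x, y, bandwidth=1.0):
    """Gaussian kernel density at (x, y), estimated only from the visits whose
    timestamp t satisfies slice_start <= t < slice_end (hypothetical helper)."""
    pts = [(vx, vy) for (t, vx, vy) in visits if slice_start <= t < slice_end]
    if not pts:
        return 0.0  # no evidence in this time slice
    h2 = bandwidth ** 2
    # Normalisation of a 2-D isotropic Gaussian kernel, averaged over the points.
    norm = 1.0 / (2.0 * math.pi * h2 * len(pts))
    total = sum(math.exp(-((x - vx) ** 2 + (y - vy) ** 2) / (2.0 * h2))
                for vx, vy in pts)
    return norm * total


def rank_sites(sites, visits, slice_start, slice_end, bandwidth=1.0):
    """Order candidate sites, given as (name, x, y), by their density in the slice."""
    return sorted(
        sites,
        key=lambda s: time_slice_density(visits, slice_start, slice_end,
                                         s[1], s[2], bandwidth),
        reverse=True,
    )
```

    Because only visits in the current slice contribute, older visits stop influencing the ranking once the slice moves on, which is one simple way to follow interest drift.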

    Transductive hyperspectral image classification: toward integrating spectral and relational features via an iterative ensemble system

    Remotely sensed hyperspectral image classification is a very challenging task due to the spatial correlation of the spectral signature and the high cost of true sample labeling. In light of this, the collective inference paradigm allows us to manage the spatial correlation between spectral responses of neighboring pixels, as interacting pixels are labeled simultaneously. The transductive inference paradigm allows us to reduce the inference error for the given set of unlabeled data, as sparsely labeled pixels are learned by accounting for both labeled and unlabeled information. In this paper, both these paradigms contribute to the definition of a spectral-relational classification methodology for imagery data. We propose a novel algorithm to assign a class to each pixel of a sparsely labeled hyperspectral image. It integrates the spectral information and the spatial correlation through an ensemble system. For every pixel of a hyperspectral image, spatial neighborhoods are constructed and used to build application-specific relational features. Classification is performed with an ensemble comprising a classifier learned from the available spectral information (associated with the pixel) and the classifiers learned from the extracted spatio-relational information (associated with the spatial neighborhoods). The most reliable labels predicted by the ensemble are fed back to the labeled part of the image. Experimental results highlight the importance of the spectral-relational strategy for the accurate transductive classification of hyperspectral images and validate the proposed algorithm.
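
    As an illustration of a spatio-relational feature, the sketch below computes, for one pixel, the mean spectrum of the pixels in its square neighbourhood. It assumes the image is a list of rows of per-pixel band vectors; the helper `neighbourhood_mean` and its `radius` parameter are hypothetical, and the relational features actually built by the proposed algorithm may differ.

```python
def neighbourhood_mean(image, r, c, radius=1):
    """Mean spectrum of the pixels within `radius` of (r, c), excluding (r, c).
    `image[i][j]` is the list of band values of pixel (i, j) (hypothetical layout)."""
    rows, cols = len(image), len(image[0])
    bands = len(image[r][c])
    acc = [0.0] * bands
    n = 0
    for i in range(max(0, r - radius), min(rows, r + radius + 1)):
        for j in range(max(0, c - radius), min(cols, c + radius + 1)):
            if (i, j) == (r, c):
                continue  # the pixel's own spectrum is the spectral view, not the relational one
            for b, v in enumerate(image[i][j]):
                acc[b] += v
            n += 1
    # Border pixels simply have fewer neighbours; an isolated pixel gets zeros.
    return [a / n for a in acc] if n else acc
```

    Concatenating such vectors for several radii would give one relational view per neighbourhood size, which could then feed the separate classifiers of an ensemble.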

    A novel spectral-spatial co-training algorithm for the transductive classification of hyperspectral imagery data

    The automatic classification of hyperspectral data is made complex by several factors, such as the high cost of true sample labeling coupled with the high number of spectral bands, as well as the spatial correlation of the spectral signature. In this paper, a transductive collective classifier is proposed for dealing with all these factors in hyperspectral image classification. The transductive inference paradigm allows us to reduce the inference error for the given set of unlabeled data, as sparsely labeled pixels are learned by accounting for both labeled and unlabeled information. The collective inference paradigm allows us to manage the spatial correlation between spectral responses of neighboring pixels, as interacting pixels are labeled simultaneously. In particular, the innovative contributions of this study include: (1) the design of an application-specific co-training schema that uses both spectral information and spatial information, iteratively extracted at the object (set of pixels) level via collective inference; (2) the formulation of a spatial-aware example selection schema that accounts for the spatial correlation of predicted labels to augment training sets during iterative learning; and (3) the investigation of a diversity class criterion that allows us to speed up co-training classification. Experimental results validate the accuracy and efficiency of the proposed spectral-spatial, collective, co-training strategy.
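
    One plausible reading of spatial-aware example selection is sketched below: a pixel is moved into the training set only if its prediction is confident and agrees with a strict majority of the labels predicted for its neighbours. The helper `spatially_consistent`, the 4-neighbourhood, the strict-majority rule and the `threshold` value are all assumptions, not the schema defined in the paper.

```python
def spatially_consistent(pred, conf, threshold=0.9):
    """Return the (row, col) positions of pixels whose prediction is confident AND
    agrees with a strict majority of the predicted labels of their 4-neighbours.
    `pred` and `conf` are same-shaped grids of labels and confidences."""
    rows, cols = len(pred), len(pred[0])
    selected = []
    for r in range(rows):
        for c in range(cols):
            if conf[r][c] < threshold:
                continue  # not confident enough to be used as a pseudo-label
            neigh = [pred[i][j]
                     for i, j in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= i < rows and 0 <= j < cols]
            agree = sum(1 for lab in neigh if lab == pred[r][c])
            # Strict majority: ties are rejected rather than broken arbitrarily.
            if neigh and 2 * agree > len(neigh):
                selected.append((r, c))
    return selected
```

    In a co-training loop, the pixels returned here would be added, with their predicted labels, to the training set of the other view before the next iteration.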

    Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction

    BACKGROUND: Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically, that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization, in which instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in the predictive accuracy of the learned classifiers. RESULTS: This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function.
    CONCLUSIONS: Our newly developed method for HMC takes network information into account in the learning phase: when used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/descriptions, functional annotation schemes, and PPI networks: the best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
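
    Network autocorrelation can be quantified in several ways. The sketch below computes global Moran's I, a standard measure, for a binary indicator of one functional class over an unweighted PPI graph given as an edge list. It is only an illustration of the concept; it is not claimed to be the autocorrelation measure used inside NHMC, and `morans_i` is a hypothetical helper.

```python
def morans_i(values, edges):
    """Global Moran's I of a node attribute over an undirected, unweighted graph.
    values[k] is the attribute of node k (e.g. 1 if gene k has a given annotation);
    edges is a list of (u, v) node-index pairs."""
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    denom = sum(d * d for d in dev)
    if denom == 0 or not edges:
        raise ValueError("need a non-constant attribute and at least one edge")
    # Each undirected edge counts as w_uv = w_vu = 1, so the total weight W is 2|E|.
    total_w = 2 * len(edges)
    num = sum(2 * dev[u] * dev[v] for u, v in edges)
    return (n / total_w) * (num / denom)
```

    Values close to +1 indicate that interacting genes tend to share the annotation (positive autocorrelation), while values close to -1 indicate that linked genes tend to differ.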

    DENCAST: distributed density-based clustering for multi-target regression

    Recent developments in sensor networks and mobile computing have led to a huge increase in the amount of generated data that needs to be processed and analyzed efficiently. In this context, many distributed data mining algorithms have recently been proposed. Following this line of research, we propose the DENCAST system, a novel distributed algorithm implemented in Apache Spark, which performs density-based clustering and exploits the identified clusters to solve both single- and multi-target regression tasks (and thus complex tasks such as time series prediction). Unlike existing distributed methods, DENCAST does not require a final merging step (usually performed on a single machine) and is able to handle large-scale, high-dimensional data by taking advantage of locality sensitive hashing. Experiments show that DENCAST performs clustering more efficiently than a state-of-the-art distributed clustering algorithm, especially when the number of objects increases significantly. The quality of the extracted clusters is confirmed by the predictive capabilities of DENCAST on several datasets: it significantly outperforms (p-value < 0.05) state-of-the-art distributed regression methods in both single- and multi-target settings.
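
    The role of locality sensitive hashing can be illustrated with a minimal sketch: points are hashed into buckets so that the neighbour search needed for density estimation only compares points sharing a bucket. This sketch uses the random-hyperplane (sign of random projection) family purely for illustration; the hashing scheme and parameters used by DENCAST may differ, and `lsh_buckets` and `n_planes` are hypothetical names.

```python
import random
from collections import defaultdict


def lsh_buckets(points, n_planes=8, seed=0):
    """Group point indices by a random-hyperplane LSH signature.
    Points at a small angle from each other are likely to share a signature."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each hyperplane is a random Gaussian normal vector.
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        # One bit per hyperplane: which side of the plane the point lies on.
        key = tuple(int(sum(a * b for a, b in zip(plane, p)) >= 0) for plane in planes)
        buckets[key].append(idx)
    return dict(buckets)
```

    Since each bucket can be processed independently, this kind of partitioning is one way to avoid shipping all points to a single machine for neighbour search.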

    ORANGE: Outcome-Oriented Predictive Process Monitoring Based on Image Encoding and CNNs

    Outcome-oriented predictive process monitoring is a family of predictive process mining techniques that have witnessed rapid development and increasing adoption in the past few years. Building on recent successful applications of deep learning in predictive process mining, we propose ORANGE, a novel deep learning method for learning outcome-oriented predictive process models. The main innovation of this study is the adoption of an imagery representation of ongoing traces, which captures potential data patterns that arise at neighbouring pixels. Using a collection of images representing ongoing traces, we train a Convolutional Neural Network (CNN) to predict the outcome of an ongoing trace. The empirical study shows the feasibility of the proposed method by evaluating its accuracy on different benchmark outcome prediction problems in comparison with state-of-the-art competitor methods. In addition, we show how ORANGE can be integrated as an Intelligent Assistant into a CVM developed by the company MTM Project srl to support sales agents in their negotiations. This case study shows that ORANGE can be effectively used to monitor the outcome of ongoing negotiations by highlighting early on the negotiations that are likely to be completed successfully.
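
    A generic way to turn a trace into an image is sketched below, assuming each ongoing trace has already been summarised as a fixed-length numeric feature vector. The vector is min-max scaled to [0, 1], zero-padded to the next perfect square and reshaped into a square grayscale grid suitable as CNN input. The function `vector_to_image` and this particular encoding are assumptions; the encoding used by ORANGE may differ.

```python
import math


def vector_to_image(features):
    """Encode a feature vector as a square grid of pixel intensities in [0, 1]."""
    if not features:
        raise ValueError("empty feature vector")
    side = math.ceil(math.sqrt(len(features)))
    lo, hi = min(features), max(features)
    span = (hi - lo) or 1.0  # constant vectors map to all-zero intensities
    scaled = [(f - lo) / span for f in features]
    scaled += [0.0] * (side * side - len(scaled))  # pad to a full square
    # Row-major reshape: adjacent features become neighbouring pixels.
    return [scaled[r * side:(r + 1) * side] for r in range(side)]
```

    The row-major layout is what makes features that are adjacent in the vector also adjacent in the image, so that convolution filters can pick up local patterns.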

    Predictive modeling of PV energy production: How to set up the learning task for a better prediction?

    In this paper, we tackle the problem of predicting the power produced by several photovoltaic (PV) plants spread over an extended geographic area and connected to a power grid. The paper is intended to be a comprehensive study of one-day-ahead forecasting of PV energy production along several dimensions of analysis: (i) the consideration of the spatio-temporal autocorrelation, which characterizes geophysical phenomena, to obtain more accurate predictions; (ii) the learning setting to be considered, i.e., simple output prediction for each hour or structured output prediction for each day; (iii) the learning algorithms: we compare artificial neural networks, most often used for PV forecasting, and regression trees for learning adaptive models. The results obtained on two PV power plant datasets show that taking spatio-temporal autocorrelation into account is beneficial, that the structured output prediction setting significantly outperforms the non-structured one, and that regression trees provide better models than artificial neural networks.
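
    The difference between the two learning settings in (ii) can be made concrete by how training examples are built. The sketch below assumes one hypothetical feature vector per day (for instance, a summary of the weather forecast) and one list of hourly production values per day; the function names are illustrative and do not come from the paper.

```python
def per_hour_examples(daily_features, daily_production):
    """Simple-output setting: one example per (day, hour); the hour index is
    appended as an extra feature and the target is a single value."""
    examples = []
    for feats, prod in zip(daily_features, daily_production):
        for hour, value in enumerate(prod):
            examples.append((feats + [hour], value))
    return examples


def per_day_examples(daily_features, daily_production):
    """Structured-output setting: one example per day whose target is the
    whole hourly production profile, predicted jointly."""
    return [(list(feats), list(prod))
            for feats, prod in zip(daily_features, daily_production)]
```

    In the structured setting a single model sees the whole daily profile as one target, so dependencies between consecutive hours can be exploited; in the simple setting each hour is predicted independently.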

    Mining Multi-Relational Gradual Patterns

    Gradual patterns highlight covariations of attributes of the form "The more/less X, the more/less Y". Their usefulness in several applications has recently stimulated the design of several algorithms for their automated discovery from large datasets. However, existing techniques require all the relevant data to be in a single database relation or table. This paper extends the notion of gradual pattern to the case in which the covariations are possibly expressed between attributes of different database relations. The interestingness measure for this class of "relational gradual patterns" is defined on the basis of both Kendall's τ and gradual supports. Moreover, this paper proposes two algorithms, named τRGP Miner and gRGP Miner, for the discovery of relational gradual rules. Three pruning strategies to reduce the search space are proposed. The efficiency of the algorithms is empirically validated, and the usefulness of relational gradual patterns is demonstrated on some real-world databases.
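
    Both measures can be computed by comparing pairs of objects, as in the minimal sketch below. It assumes the values of X and Y have already been aligned per object (in the relational case, for example, through a join on a foreign key). `kendall_tau` implements the plain tau-a statistic without tie correction, and `gradual_support` uses one common pair-counting definition; neither is claimed to be the exact measure used by τRGP Miner or gRGP Miner.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of object pairs."""
    conc = disc = 0
    for i, j in combinations(range(len(xs)), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (conc - disc) / n_pairs


def gradual_support(xs, ys, x_more=True, y_more=True):
    """Fraction of object pairs respecting 'the more/less X, the more/less Y';
    pairs tied on X or Y never count as support."""
    sx = 1 if x_more else -1
    sy = 1 if y_more else -1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    ok = sum(1 for i, j in combinations(range(len(xs)), 2)
             if sx * (xs[i] - xs[j]) * sy * (ys[i] - ys[j]) > 0)
    return ok / n_pairs
```

    For example, with X = [1, 2, 3] and Y = [1, 3, 2], two of the three pairs are concordant and one is discordant, so tau is 1/3 and the support of "the more X, the more Y" is 2/3.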